Attentional Pooling for Action Recognition

Authors

  • Rohit Girdhar
  • Deva Ramanan
Abstract

We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over state-of-the-art base architectures on three standard action recognition benchmarks across still images and videos, and establishes a new state of the art on the MPII dataset with a 12.5% relative improvement. We also perform an extensive analysis of our attention module both empirically and analytically. In terms of the latter, we introduce a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification). From this perspective, our attention formulation suggests a novel characterization of action recognition as a fine-grained recognition problem.
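The low-rank view mentioned in the abstract can be illustrated with a small numerical sketch. It assumes a feature matrix X with one row per spatial location and a bilinear classifier weight W, and checks that when W is restricted to rank 1 (W = a b^T) the second-order pooling score reduces to an attention-weighted sum of per-location scores. The names X, W, a, b, the function names, and the shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bilinear_score(X, W):
    """Second-order pooling score: elementwise sum of (X^T X) * W.

    X: (n, f) features at n spatial locations.
    W: (f, f) classifier weights (hypothetical, for illustration).
    """
    return float(np.sum((X.T @ X) * W))


def rank1_attention_score(X, a, b):
    """Same score for W = outer(a, b), computed without forming X^T X."""
    attention = X @ a      # (n,) one attention weight per location
    per_location = X @ b   # (n,) one classifier response per location
    return float(attention @ per_location)


# Illustrative sizes: a 7x7 feature map with 16 channels.
rng = np.random.default_rng(0)
X = rng.standard_normal((49, 16))
a = rng.standard_normal(16)
b = rng.standard_normal(16)

full = bilinear_score(X, np.outer(a, b))
low_rank = rank1_attention_score(X, a, b)
print(full, low_rank)
```

The two printed values agree up to floating-point error, since sum((X^T X) * a b^T) = a^T X^T X b = (Xa)^T (Xb). In this sketch the rank-1 form needs only two vectors of length f instead of an f-by-f weight matrix, which is one way to read the abstract's claim that the module adds little to network size or cost.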


Similar resources

The Effect of Attentional Focus on Gaze Behavior and Accuracy of Dart Throwing: The Attentional Task Demands Problem

Focus of attention and quiet eye (QE) are among the variables affecting aiming-task performance, and in recent decades they have been of continuing interest to psychologists and sport science researchers. The purpose of this study was to investigate the effect of attentional instructions on the gaze behavior and dart-throwing accuracy of novices under low and high task load. In a semi-experimental design with re...


Order-aware Convolutional Pooling for Video Based Action Recognition

Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame. The pooling methods that they adopt, however, usually completely or partially neglect the dynamic information contained in the temporal domain, which may undermine the discriminative power of the resulting video representation since the video sequence ...


Second-order Temporal Pooling for Action Recognition

Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth-order (max) or first-order (average) statistics are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...


Eigen Evolution Pooling for Human Action Recognition

We introduce Eigen Evolution Pooling, an efficient method to aggregate a sequence of feature vectors. Eigen evolution pooling is designed to produce compact feature representations for a sequence of feature vectors while preserving as much information about the sequence as possible, especially the temporal evolution of the features. Eigen evolution pooling is a general pool...


The Effect of Holistic Face Processing on Attentional Bias Toward Emotional Faces in Anxious Children

This study examined the effect of holistic face processing and trait anxiety on children’s attentional biases toward schematic natural and jumbled emotional faces (angry, happy, neutral). Participants were selected based on their scores on the Trait Anxiety Inventory for Children (Spielberger, 1973) and the results of a semi-structured interview. 30 high- and 30 low ...




Publication date: 2017